MESPool: Molecular Edge Shrinkage Pooling for hierarchical molecular representation learning and property prediction

Abstract Identifying task-relevant structures is important for molecular property prediction. In a graph neural network (GNN), graph pooling can group nodes and hierarchically represent the molecular graph. However, previous pooling methods either drop node information or lose the connections of the original graph; therefore, it is difficult for them to identify continuous substructures. Importantly, they lack interpretability on molecular graphs. To this end, we proposed a novel Molecular Edge Shrinkage Pooling (MESPool) method, which is based on edges (or chemical bonds). MESPool preserves crucial edges and shrinks the others inside the functional groups, and it is able to search for key structures without breaking the original connections. We compared MESPool with various well-known pooling methods on different benchmarks and showed that MESPool outperforms the previous methods. Furthermore, we explained the rationality of MESPool on some datasets, including a COVID-19 drug dataset.


INTRODUCTION
Molecular property prediction is a fundamental task in drug discovery and plays a crucial role in computer-aided drug discovery workflows [1] because many methods rely on predicted molecular properties to evaluate, select and generate molecules [2]. In recent years, AI-driven molecular property prediction methods have become a hot spot in the field of lead compound discovery and optimization [1,3]. At the same time, graph neural networks (GNNs) have shown their power in graph representation learning [4,5], and have been further applied to molecular graph data [6,7]. Traditional machine learning methods require handcrafted molecular features, such as molecular fingerprints and descriptors [8,9]. By comparison, GNNs learn high-dimensional embeddings of atoms and bonds end to end through message passing (or graph convolution) and therefore represent the graph structures and interactions of molecules.
A general message passing scheme smooths node signals across the graph by an aggregation operation and implicitly learns the structural information. However, it is difficult to distinguish task-related structures from unrelated parts [10], or to hierarchically represent the graph. To address these problems and obtain refined graph local representations, local pooling layers can be inserted into typical GNNs, similar to convolutional neural networks (CNNs) [11]. Graph local pooling operations hierarchically reduce the graph representation and preserve the local structural information of interest.
Previous graph pooling methods can be roughly classified into two categories, sparse pooling [12][13][14] and dense pooling [15][16][17][18], according to their node selection method. Liu et al. [19] listed and summarized twenty representative graph pooling methods. The characteristics of sparse pooling and dense pooling can be summarized as follows:
• Sparse pooling: (i) Sparse pooling aims to preserve the task-related nodes and drop the unrelated nodes in each layer. (ii) Some structural information is lost in the process of node dropping, but the original connections are preserved. (iii) A threshold (minimum score) can be set to adaptively adjust the number of pooled nodes. (iv) The number of parameters does not increase with the graph size.
• Dense pooling: (i) Dense pooling aims to cluster nodes and hierarchically represent the graph in each layer. (ii) A fixed ratio or number of pooling clusters is always needed. (iii) Dense pooling does not drop node information, but the rebuilt connections of the pooled graph are not strongly related to the original graph. (iv) The number of parameters is related to the size of the graph. Thus, dense pooling requires much more computational resources than sparse pooling on large graphs.
In particular, most pooling methods are hardly interpretable, yet interpretability is critical in drug- and medical-related tasks [19].
To this end, we proposed a novel graph pooling method called Molecular Edge Shrinkage Pooling (MESPool). MESPool is inspired by the concepts of 'molecular scaffold' and 'functional group', and it aims to gradually shrink the scaffold and functional groups into supernodes in a similar way to the scaffold tree [20]. With the deepening of the network, the crucial edges are preserved, and the graph is simplified into the connection of functional groups. Different from the previous methods, we regard edges (chemical bonds) as the basic pooling unit instead of nodes (atoms). MESPool scores edges with their features and uses a threshold to adaptively adjust the pooling proportion. Accordingly, we also proposed a convolution operator, the edge-featured graph isomorphism network (EGIN), which introduces edge feature updates based on the graph isomorphism network (GIN) [21].
MESPool has the following advantages: (i) It has the ability to distinguish the task-related structures and hierarchically represent the graph. (ii) It maintains the original graph connection without missing any node information. (iii) The pooled graph contains original nodes and supernodes, where supernodes represent disjoint substructures. (iv) The number of parameters is fixed and does not increase with the graph size. (v) More importantly, the pooling result of MESPool can provide good rationality and further serves as a valuable guide for interpretability, as discussed in the Results.

Notation (Table 1):
$\mathcal{N}(i)$: The neighbor node set of node $i$.
$n$: The number of nodes.
$m$: The number of edges.
$d_v$: The dimension of a node feature vector.
$d_e$: The dimension of an edge feature vector.
$h_i \in \mathbb{R}^{d_v}$: Feature vector of node $i$.
$h_{i \to j} \in \mathbb{R}^{d_e}$: Feature vector of directed edge $i \to j$.
$h_{e(i,j)} \in \mathbb{R}^{d_e}$: Feature vector of undirected edge $e(i, j)$.
$H_V \in \mathbb{R}^{n \times d_v}$: Feature matrix of all nodes in $V$.

Graph neural networks
A graph can be represented as G = (V, E), where the node set V contains |V| = n nodes and the edge set E contains |E| = m edges.
In addition, an adjacency matrix $A \in \{0, 1\}^{n \times n}$ is normally used to describe the connections in a graph (the common notations and their descriptions are listed in Table 1). The main idea of GNNs is to learn node feature representations by iteratively aggregating node features from neighbors and integrating the aggregated information with the central node representation. A general GNN scheme (or a graph convolution layer) can be formalized as a message passing (MP) function:
$$h_i^{(l)} = \mathrm{UPDATE}^{(l)}\left(h_i^{(l-1)}, \mathrm{AGGREGATE}^{(l)}\left(\left\{h_j^{(l-1)} : j \in \mathcal{N}(i)\right\}\right)\right)$$
where $h_i^{(l)} \in \mathbb{R}^{d_v}$ is the representation of node $i$ at the $l$th GNN layer. $\mathrm{AGGREGATE}^{(l)}$ and $\mathrm{UPDATE}^{(l)}$ denote the functions of the aggregation operation and update operation at the $l$th GNN layer, respectively. Finally, an entire graph representation $h_G^{(l)} \in \mathbb{R}^{d}$ can be obtained by a readout (global pooling) function:
$$h_G^{(l)} = \mathrm{READOUT}\left(\left\{h_i^{(l)} : i \in V\right\}\right)$$
Various GNNs have been proposed in recent years. Kipf and Welling [22] proposed the graph convolution network (GCN), which simplified the approximation of the graph Laplacian using the Chebyshev expansion method [23]. GraphSAGE [24] learns node embeddings through sampling and aggregation. The graph attention network (GAT) [25] introduced attention mechanisms to calculate the weight of nodes while propagating. In addition, Xu et al. [21] discussed the design principles of aggregation and update operations and proposed the graph isomorphism network (GIN).
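The message passing and readout steps described above can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: it uses sum aggregation, a toy update rule (the mean of the central feature and the aggregated message) and a sum readout, none of which is prescribed by the text; the names `message_passing_step` and `readout` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def message_passing_step(features: Dict[int, Vector],
                         neighbors: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One round of message passing with sum aggregation (assumed)."""
    updated = {}
    for i, h_i in features.items():
        # AGGREGATE: element-wise sum of the neighbor features.
        agg = [0.0] * len(h_i)
        for j in neighbors.get(i, []):
            agg = [a + b for a, b in zip(agg, features[j])]
        # UPDATE: toy rule (assumed), mean of central feature and message.
        updated[i] = [0.5 * (c + a) for c, a in zip(h_i, agg)]
    return updated


def readout(features: Dict[int, Vector]) -> Vector:
    """Global sum pooling over all node features."""
    dim = len(next(iter(features.values())))
    out = [0.0] * dim
    for h in features.values():
        out = [o + x for o, x in zip(out, h)]
    return out


# A three-node path graph 0 - 1 - 2 with one-dimensional features.
feats = {0: [1.0], 1: [2.0], 2: [3.0]}
nbrs = {0: [1], 1: [0, 2], 2: [1]}
new_feats = message_passing_step(feats, nbrs)
graph_vec = readout(new_feats)
```

In a trained GNN the update would be a learnable function; the fixed averaging here only keeps the example easy to follow by hand.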

Graph pooling
Graph local pooling (hereinafter referred to as graph pooling) layers allow GNNs to obtain graph local structural information hierarchically by reducing the number of nodes. Grattarola et al. [26] summarized the graph pooling operator as the combination of three functions: selection, reduction and connection (SRC).
The selection function groups nodes into subsets (dense pooling) or just selects the important nodes (sparse pooling); then, the reduction function aggregates subsets into supernodes or just deletes the nodes that are not of interest; finally, the connection function relinks the reduced nodes and outputs a pooled graph.
Sparse pooling exploits learnable scoring functions to delete nodes with lower significance scores. As a representative sparse pooling method, TopK [12,13] scores nodes based on a learnable projection vector, keeps high-scoring nodes and drops low-scoring nodes. SAGPool [14] improves TopK by using a GNN to score nodes, so that both node features and graph topology are considered.
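To make the node-dropping behavior of sparse pooling concrete, the following is a minimal sketch of TopK-style selection in plain Python. It is not the reference implementation: the learnable projection vector is passed in as a fixed list `projection`, the feature gating used by TopK implementations is omitted, and the function name `topk_pool` and the `ratio` parameter are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def topk_pool(features: Dict[int, List[float]],
              edges: List[Tuple[int, int]],
              projection: List[float],
              ratio: float = 0.5):
    """Keep the highest-scoring nodes and only the edges among them."""
    norm = math.sqrt(sum(p * p for p in projection))
    # Score each node by projecting its feature onto the projection vector.
    scores = {i: sum(x * p for x, p in zip(h, projection)) / norm
              for i, h in features.items()}
    k = max(1, math.ceil(ratio * len(features)))
    kept = sorted(scores, key=scores.get, reverse=True)[:k]
    kept_set = set(kept)
    # Edges touching a dropped node disappear together with that node.
    kept_edges = [(i, j) for i, j in edges if i in kept_set and j in kept_set]
    return sorted(kept), kept_edges


# A four-node path graph 0 - 1 - 2 - 3.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [3.0, 0.0], 3: [2.0, 0.0]}
edges = [(0, 1), (1, 2), (2, 3)]
kept_nodes, kept_edges = topk_pool(feats, edges, projection=[1.0, 0.0])
```

In this toy path graph, nodes 0 and 1 are discarded along with their edges, which illustrates the loss of node information that sparse pooling accepts.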
Dense pooling considers graph pooling as a node clustering problem, and it groups nodes into a fixed number of clusters by computing a cluster assignment matrix. DiffPool [15] and MinCut [16] use a GNN and an MLP, respectively, to compute the cluster assignment matrix, and constrain the rationality of clustering with regularization terms.

Atom features (Table 3):
The atom is or is not part of an aromatic system (size 1).
Hs: Number of bonded hydrogen atoms (size 5).
Formal charge: The electronic charge assigned to the atom (size 5).
Chirality: The chirality type of the atom (size 4).
Hybridization: The hybridization form of the atom (size 5).
Bond features (Table 3; total size 13):
Bond type: Type of bond (size 5).
Conjugation: The bond is or is not conjugated (size 1).
In ring: The bond is or is not part of a ring (size 1).
Stereo: The stereo type of the bond (size 6).

Figure 2, panels B and C: (B) an example of edge updating (edges i → j and j → i are updated with their starting nodes i and j, and undirected edges are converted into directed edges); (C) a central node (i) is updated with its neighboring edges (j → i and k → i).
In addition, there are some other interesting pooling methods. ASAP [27] is a mixed method that introduces a self-attention mechanism, Master2Token, into the general dense pooling process to consider the information inside the clusters. Furthermore, ASAP scores clusters and drops the low-scoring clusters. Unlike the other methods, EdgePool [28,29] is a distinctive hard pooling method; it scores edges, traverses the graph and contracts half of the edges that are high-scoring but nonadjacent.
The SRC functions of the above methods are listed in Table 2 as baseline methods. Besides these classic methods, some new methods also presented intriguing ideas. HGP-SL [30] is a sparse pooling method that selects nodes that are more representative of their neighbors by calculating the Manhattan distance between nodes. Haar graph pooling [31] applies the Haar basis system to compress the graph; it generates groupings of nodes following a series of clustering methods. TAP [32] selects important nodes by a two-stage voting (local and global) process to consider the topology of the graph. However, these methods still follow a framework comparable to the commonly used sparse and dense pooling methods.
A diagrammatic sketch of the comparison between MESPool and sparse/dense pooling is shown in Figure 1. The core idea of MESPool is to discriminate between the edges within functional groups and the edges connecting the functional groups, and to abstract the molecular graph into the connection of functional group supernodes. MESPool divides the graph in a way that conforms to chemical intuition and preserves the original connection relationship of the graph.

Edge-featured Graph Isomorphism Network
Edges with their features play an essential role in many real-world graph data. For molecular graphs, edge features describe chemical bond type, conjugation, ring and stereo information (the initial atom and bond features are listed in Table 3). In addition, an edge can represent the smallest substructure in the graph with two connected nodes, and we call this unit the smallest pooling substructure in our MESPool. To update and propagate the edge features, we propose the EGIN, which alternately updates edge features and node features based on the framework of the GIN.
The nodewise formulation of the GIN can be described as [21]
$$h_i^{(l)} = \mathrm{MLP}\left((1 + \epsilon) \cdot h_i^{(l-1)} + \sum_{j \in \mathcal{N}(i)} h_j^{(l-1)}\right)$$
where $\epsilon$ is a learnable parameter that adjusts the weight of the central node. The adjacent node messages are aggregated through a summation function, and the central node feature is then updated by a multilayer perceptron (MLP).
In an EGIN layer, the edge feature is updated first by concatenating it with its starting node feature, and undirected edges are converted to directed edges through this operator (Figure 2A, B). Here, $h^{(l)}_{i \to j} \in \mathbb{R}^{d_e}$ denotes the feature of directed edge $i \to j$ at the $l$th layer, and the operator uses a learnable transformation matrix $W$ and a learnable bias $b$. Since the updated edge feature $h^{(l)}_{i \to j}$ contains the information of directed edge $i \to j$ and the starting node $i$, we can treat $h^{(l)}_{i \to j}$ as the weighted message of neighbor node $i$ to central node $j$. Therefore, the aggregation function becomes the summation of neighboring edge features. In the node update operator, we use the concatenate function to combine the central node feature $h^{(l-1)}_i$ and the aggregated message $\sum_{k \in \mathcal{N}(i) \setminus j} h^{(l)}_{k \to i}$ because they are different kinds of features and may have different dimensions.
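The two-stage update of an EGIN layer (edges first, then nodes) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the exact update equations are not reproduced in this text, so the learnable linear map ($W$, $b$) and the MLP are replaced by the caller-supplied stand-ins `edge_fn` and `node_fn`, and the function name `egin_layer` is hypothetical. The sketch also assumes every node has at least one incident edge.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def egin_layer(node_feats: Dict[int, Vector],
               edge_feats: Dict[Tuple[int, int], Vector],
               edge_fn: Callable[[Vector], Vector],
               node_fn: Callable[[Vector], Vector]):
    """One EGIN-style layer: update directed edges first, then nodes."""
    # Edge update: each undirected edge e(i, j) yields two directed edges.
    # `a + b` on lists is concatenation of the starting node feature
    # and the edge feature, not element-wise addition.
    directed = {}
    for (i, j), h_e in edge_feats.items():
        directed[(i, j)] = edge_fn(node_feats[i] + h_e)
        directed[(j, i)] = edge_fn(node_feats[j] + h_e)
    # Node update: sum the incoming directed-edge messages, concatenate
    # the sum with the central node feature, then transform.
    new_nodes = {}
    for i, h_i in node_feats.items():
        incoming = [h for (src, dst), h in directed.items() if dst == i]
        dim = len(incoming[0]) if incoming else 0
        agg = [sum(h[d] for h in incoming) for d in range(dim)]
        new_nodes[i] = node_fn(h_i + agg)
    return new_nodes, directed


# Two atoms joined by one bond; both stand-in functions just sum their input.
nodes = {0: [1.0], 1: [2.0]}
bonds = {(0, 1): [10.0]}
new_nodes, directed_edges = egin_layer(
    nodes, bonds, edge_fn=lambda v: [sum(v)], node_fn=lambda v: [sum(v)])
```

Concatenation is used in both steps because, as noted above, node and edge features are different kinds of features and need not share a dimension.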
In the following sections, we simply express the graphwise EGIN process as follows:

Molecular Edge Shrinkage Pooling
In this section, we describe the components of our MESPool following the SRC scheme. The main idea of MESPool is to preserve crucial edges and shrink other structures into supernodes (see Figure 3A, B). Unlike the previous methods, MESPool initially selects edges (units) to pool rather than nodes. Edge selection can distinguish the connected substructures better than node selection in this framework (see Figure 3C), which is helpful for hierarchical representation. MESPool can be seen as a mixture of sparse and dense pooling: it selects and splits the units by scoring, like sparse pooling, but instead of dropping the low-scored nodes (units), it reduces their clusters into supernodes, like dense pooling. In such a process, MESPool maintains the connections of the original graph and does not lose any node information. In addition, since the scaffolds are shrunk into nodes, the distance between side chains is shortened on the pooled graph, so the combination and interaction of functional groups can be represented in the deeper network after pooling.

Selection: threshold splitting
In the selection process, we score the undirected edges to represent the weight of the corresponding units, and adjacent low-scored units are considered as a subset (pooling substructure). Therefore, it is necessary to consider the adjacency information when scoring an edge. Initially, we adopt an edge message propagation in the selection process. Note that the directed edge features here already contain the information of the starting nodes, thanks to the edge update operator in the EGIN layer before the pooling layer. Consequently, the undirected edge feature $h_{e(i,j)}$ can represent the information of the corresponding unit and its neighboring units. The scoring function is a linear transformation on $h_{e(i,j)}$:
$$s_{e(i,j)} = \sigma\left(\mathbf{s} \cdot h_{e(i,j)}\right) \quad (8)$$
where $s_{e(i,j)}$ is the score of undirected edge $e(i, j)$, $\mathbf{s} \in \mathbb{R}^{d_e}$ is a learnable scoring operator and $\sigma$ denotes the sigmoid function. The edge features can be further updated by the score. According to the edge score, units can be divided into two groups by a manually set hyperparameter $\lambda$. In addition, the average value of the edge scores, $s_{mean}$, is used to control the number of pooling units and ensure that the layer will not pool all the nodes at once. Consequently, we have a threshold that is the minimum of $\lambda$ and $s_{mean}$:
$$\mathrm{threshold} = \mathrm{MIN}(\lambda, s_{mean}) \quad (10)$$
The pooled edge set and node set are then determined by this threshold.
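A minimal sketch of the scoring and threshold splitting in this subsection is given here in plain Python. Several details are assumptions made for illustration and are labeled in the code: edges scoring strictly below the threshold are treated as pooled, the pooled node set is taken to be the endpoints of the pooled edges, and the scoring vector is passed in as a fixed list `scorer` rather than learned. The name `split_by_threshold` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def split_by_threshold(edge_feats: Dict[Edge, List[float]],
                       scorer: List[float],
                       lam: float):
    """Score undirected edges and split them into pooled and preserved sets."""
    scores = {}
    for e, h in edge_feats.items():
        z = sum(s * x for s, x in zip(scorer, h))
        scores[e] = 1.0 / (1.0 + math.exp(-z))  # sigmoid, as in Eq. (8)
    s_mean = sum(scores.values()) / len(scores)
    threshold = min(lam, s_mean)  # Eq. (10)
    # Assumption: edges scoring strictly below the threshold are pooled.
    pooled = {e for e, s in scores.items() if s < threshold}
    preserved = set(scores) - pooled
    # Assumption: pooled nodes are the endpoints of the pooled edges.
    pooled_nodes = {v for e in pooled for v in e}
    return scores, threshold, pooled, preserved, pooled_nodes


# A four-atom chain 0 - 1 - 2 - 3 with one-dimensional edge features.
feats = {(0, 1): [-2.0], (1, 2): [-1.0], (2, 3): [3.0]}
scores, threshold, pooled, preserved, pooled_nodes = split_by_threshold(
    feats, scorer=[1.0], lam=0.5)
```

With `lam=0.5` the mean score is the smaller bound, so two of the three edges are pooled; lowering `lam` to 0.2 pools only the lowest-scoring edge. Under the strict comparison assumed here, capping the threshold at the mean means that not every edge can fall below it.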

Reduction: unit shrinkage
After the pooling set is split out, a strongly connected component finding algorithm [33], CONNECTED, is used to group the connected pooled nodes into subsets $V_{subset(1)}, \ldots, V_{subset(K)}$, where $V_{subset(\cdot)}$ is a node set representing an independent pooled substructure and $K$ is the calculated number of subsets in the layer. The pooled edge and node sets $E_{pool}$ and $V_{pool}$ represent a sparse graph with multiple discrete subgraphs $V_{subset(1)}, \ldots, V_{subset(K)}$; therefore, a GNN can be used to learn the representations of the subgraphs. Here, we apply the EGIN on $E_{pool}$ and $V_{pool}$. A conventional node-featured GNN can also be used here, since only the output node features are used in the subsequent process. The subsets are then shrunk into supernodes with the updated node features, where $h_i$ is the $i$th row of $H_{V_{pool}}$ and $super(k)$ denotes the $k$th supernode. Since pooled substructures are shrunk into supernodes, their influence on the whole graph is reduced in the deeper network, and we consider the pooled part to be task unrelated.
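The grouping and shrinkage steps can be sketched as follows. Two substitutions are made and should be read as assumptions: the sketch uses a union-find structure instead of the strongly connected component algorithm cited in the text (on an undirected edge set both produce the same groups), and it sums the member features to form each supernode, since the aggregation formula is not reproduced here. The names `connected_subsets` and `shrink` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def connected_subsets(pooled_edges: Set[Tuple[int, int]]) -> List[Set[int]]:
    """Group the endpoints of the pooled edges into connected subsets."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pooled_edges:
        parent[find(i)] = find(j)

    groups: Dict[int, Set[int]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=min)


def shrink(subsets: List[Set[int]],
           node_feats: Dict[int, List[float]]) -> List[List[float]]:
    """Sum member features into one supernode feature per subset (assumed)."""
    supernodes = []
    for subset in subsets:
        dim = len(node_feats[next(iter(subset))])
        supernodes.append([sum(node_feats[v][d] for v in subset)
                           for d in range(dim)])
    return supernodes


# Two disjoint pooled substructures: {0, 1, 2} and {4, 5}.
pooled_edges = {(0, 1), (1, 2), (4, 5)}
feats = {0: [1.0], 1: [2.0], 2: [3.0], 4: [10.0], 5: [20.0]}
subsets = connected_subsets(pooled_edges)
supernode_feats = shrink(subsets, feats)
```

Each resulting subset corresponds to one supernode, so the number of supernodes $K$ is determined by the data rather than fixed in advance.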

Connection: preserved connection optimization
After reduction, the node set of the new graph, $V_{new}$, consists of the preserved original nodes and the supernodes. In several special cases, some node pairs in $V_{new}$ are connected by multiple edges, or one node has multiple self-loops. Therefore, we optimize the connection by summing the repeated edges that link the same nodes into one edge (see Figure 3D). This yields the new graph after pooling.
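The merging of repeated edges can be illustrated with the ring case of Figure 3D(i). The sketch below is a hedged example, not the authors' code: it assumes preserved edges are stored as directed pairs with feature lists, that a mapping `node_to_new` sends each pooled atom to its supernode index, and that repeated edges are merged by element-wise summation as described above. The name `merge_repeated_edges` and the supernode index 100 are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def merge_repeated_edges(preserved: Dict[Edge, List[float]],
                         node_to_new: Dict[int, int]) -> Dict[Edge, List[float]]:
    """Relabel preserved directed edges and sum those that coincide."""
    merged: Dict[Edge, List[float]] = {}
    for (src, dst), feat in preserved.items():
        # Nodes absent from the mapping were not pooled and keep their index.
        key = (node_to_new.get(src, src), node_to_new.get(dst, dst))
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], feat)]
        else:
            merged[key] = list(feat)
    return merged


# Six-membered ring 0..5 that preserves the adjacent edges 0-1 and 1-2.
# Atoms 0, 2, 3, 4 and 5 are shrunk into supernode 100; atom 1 is kept.
node_to_new = {0: 100, 2: 100, 3: 100, 4: 100, 5: 100}
preserved = {(0, 1): [1.0], (1, 0): [1.5], (1, 2): [2.0], (2, 1): [2.5]}
new_edges = merge_repeated_edges(preserved, node_to_new)
```

After relabeling, both preserved bonds connect atom 1 to the same supernode, so the four directed edges collapse into one edge per direction.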

Experiment
We evaluate the proposed MESPool and previous pooling methods on molecular classification and regression tasks. In this section, we describe the model architecture, benchmark datasets and training strategy and finally summarize the results of our experiments.

Model architecture
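In the experiment, we adopt a unified model architecture [14] (Figure 4A) for all baseline pooling methods (Table 2) and MESPool to perform a horizontal comparison. The architecture contains three blocks, and each block consists of a graph convolution layer and a graph pooling layer. Since the previous pooling methods are node-based, we use GIN as the default convolution layer. Additionally, we add a control group with EGIN for TopK, SAGPool and EdgePool as a supplement, since they are able to process edge features. We use EGIN as the convolution layer of MESPool because MESPool relies on edge features. We apply a summation readout function at the end of each block, aggregating the node features to obtain the hierarchical graph representation. Finally, the output hierarchical graph representations are concatenated and passed to a linear layer for classification. In addition, a pure GNN architecture without hierarchical pooling layers is also used to test the GIN and EGIN as a comparison (Figure 4B).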

Datasets
Seven classification and three regression benchmark datasets are selected in our experiment. BACE [34] is a dataset of molecules that provides quantitative IC50 and qualitative (binary label) binding results for a set of inhibitors of human beta-secretase 1 (BACE-1). BBBP [35] includes binary labels for over 2000 compounds on their permeability properties. The HIV dataset contains over 40 000 compounds with binary labels representing the ability to inhibit HIV replication. MUV [36] is a benchmark dataset containing 17 challenging tasks for approximately 90 000 compounds that were selected from PubChem BioAssay. SIDER [37,38] groups drug side effects into 27 system organ classes and contains over 1400 approved drugs. The Tox21 dataset contains over 8000 compounds and their qualitative toxicity measurements on 12 different targets. ClinTox [39,40] compares drugs approved by the FDA and drugs that have failed clinical trials for toxicity reasons, encompassing two classification tasks and over 1400 drugs. The ESOL (Delaney) [41] dataset is a regression dataset containing structures and water solubility data for 1128 compounds. The FreeSolv [42] dataset is a collection of experimental and calculated hydration free energies for small molecules in water. The Lipophilicity dataset, curated from the ChEMBL database, provides experimental results of the octanol/water distribution coefficient (logD at pH 7.4) of 4200 compounds.

Training strategy
The initial atom and bond features are listed in Table 3, and all the methods adopt the same initial featurization. In the experiment, we adopt the random scaffold splitting [43,44] procedure to split the dataset (training:validation:test = 8:1:1), which splits the molecules according to their scaffold, making the prediction more challenging than random splitting. We take 5 random seeds to split each dataset and apply 10 independent runs for each split. A total of 50 testing results were used to report the mean and standard deviation of the performance. In fairness to all pooling methods, we adopt the same early-stopping criterion, patience and Adam optimizer. The relevant values were set to be suitable and tolerant for all methods based on preliminary experiments. The hyperparameters (including hidden dimensions, pooling ratio, dropout ratio and learning rate) are independently tuned for each method by grid search.
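The idea behind random scaffold splitting can be sketched as below. This is a simplified illustration under stated assumptions and not the exact procedure of [43,44]: scaffolds are supplied as precomputed strings (in practice they would come from a cheminformatics toolkit), scaffold groups are shuffled with the given seed and then assigned greedily to the training, validation and test sets. The function name `random_scaffold_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def random_scaffold_split(scaffolds: Sequence[str], seed: int,
                          frac_train: float = 0.8, frac_valid: float = 0.1):
    """Split molecule indices so that each scaffold lands in one subset."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    scaffold_sets = list(groups.values())
    # The seed controls the order in which scaffold groups are assigned.
    random.Random(seed).shuffle(scaffold_sets)

    n = len(scaffolds)
    train, valid, test = [], [], []
    for members in scaffold_sets:
        if len(train) + len(members) <= frac_train * n:
            train.extend(members)
        elif len(valid) + len(members) <= frac_valid * n:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test


# Ten molecules sharing four hypothetical scaffolds "A" to "D".
scaffolds = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
train_idx, valid_idx, test_idx = random_scaffold_split(scaffolds, seed=0)
```

Because whole scaffold groups are assigned together, test molecules never share a scaffold with training molecules, which is why this protocol is harder than a purely random split. The realized proportions only approximate 8:1:1 when groups are large relative to the dataset.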

Benchmark results
Table 4 shows the performance of MESPool on seven classification tasks in comparison to baseline models. The results are evaluated by the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PRC). Table 5 shows the benchmark results on three regression tasks evaluated by the root mean squared error (RMSE). Overall, MESPool outperforms the other pooling methods on most datasets. Most baseline pooling methods show no significant performance improvement compared with the GIN network without pooling layers, especially on BACE, MUV and HIV. Additionally, comparing the models with GIN and EGIN, the performance of SAGPool improved on the seven datasets when using EGIN, while the performance of TopK and EdgePool showed no significant change. In addition, introducing the edge features did not put the performance of EGIN ahead of GIN; on the contrary, EGIN performs very poorly on ClinTox. By comparison, MESPool always maintains a decent performance. This shows that the advantage of MESPool is not due to directly introducing the edge features but to the special design of the pooling layer.

Rationality analysis
In this section, we choose two benchmark datasets (BACE and HIV) and a novel dataset of potential drugs against SARS-CoV-2 and give some examples to analyze the rationality of MESPool. We use some reported drugs to predict their properties, visualize their pooling results and compare them with their structure design principles. Additionally, we give some pooling examples on two commonly used tasks (ESOL and Mutagenicity) to explain the rationality of functional group identification.
Here, we describe two molecules for each dataset. In general, the first pooling layer tends to identify and shrink the ring structures. With the deepening of the network, more core regions are found, especially connection structures such as acetamide. Finally, the molecule is divided into several key functional groups, and the chemical bonds between them are retained (details below).
The two molecules have a common isophthalamide scaffold (structure B). GRL-8234 has a 3-methoxybenzyl group at P1 (which is critical for the enhanced cellular inhibitory properties) and a phenylalanine side chain at P1'. On the other hand, 5HA has a cyclopropyl moiety oriented toward the S1' subpocket of the enzyme active site. The pooling visualization results show that the first layer identified four key ring structures, and in the deeper layers, supernodes expand from the rings. Finally, the functional groups that make hydrophobic contacts in the BACE-1 binding pockets are recognized and shrunk.
Significantly, the BACE dataset has strict requirements for qIC50 values, so it is challenging to accurately predict GRL-8234 and 5HA. We specifically focus on GRL-8234 as a representative example (Figure 6) and visualize the pooling results of TopK and SAGPool to demonstrate the superior rationality of our method. Both baseline methods mistakenly predicted the drug GRL-8234 with a score below 0.5. As the network deepens, the loss of structural information becomes increasingly pronounced, leaving behind sparse structures that lack chemical rationality. In contrast, we believe that the accuracy of our predictions stems from the rationality of our structural selection, which is the very aspect that distinguishes our method from others.

Human immunodeficiency virus
Human immunodeficiency virus (HIV) is one of the main causes of morbidity and mortality worldwide [50]. The HIV dataset collects information on the ability of compounds to inhibit HIV replication, and the labels show the HIV activity: confirmed active and moderately active are labeled 1, and confirmed inactive is labeled 0. Highly active antiretroviral therapy (HAART) is recognized as the most effective treatment method for AIDS, and protease inhibitors play a very important role in HAART [51]. Here, we selected two protease inhibitors approved by the FDA, Darunavir [51,52] and Atazanavir [51,53], to discuss the rationality of the pooling results (Figure 7).
Darunavir has a similar structure to amprenavir; they both have a benzyl group at the P1 site, and an isobutyl group at P1' connects the phenyl amide P2' group by a sulfonamide. The main design of Darunavir is a bicyclic tetrahydrofuran (bis-THF) at the P2 site, which can effectively hydrogen bond with both Asp-29 and Asp-30 NHs present in the S2 subsite [52]. These characteristic functional groups are basically identified during the pooling process. Atazanavir exhibits potent anti-HIV activity, and a unique structural characteristic is the presence of a large phenylpyridyl P1 group that is asymmetric relative to its benzyl P1' group. The symmetrical and asymmetrical structures of Atazanavir are found at the beginning of pooling. Overall, the compounds are finally divided from the vicinity of acetylamine, which is similar to the binding pattern of FDA-approved HIV protease inhibitors [51].
Remdesivir and VV116 are both RNA-dependent RNA polymerase (RdRp) inhibitors, and they are designed on the nucleoside analog core GS-441524. GS-441524 is a prodrug that is able to diffuse into cells, is slowly converted into the nucleoside monophosphate via phosphorylation and is further processed into an active nucleoside triphosphate derivative by phosphokinase to inhibit RdRp. Remdesivir is the monophosphate of GS-441524, and the additional functional groups accelerate the phosphorylation process [56]. On the other hand, the tri-isobutyrate ester VV116 obtained good oral bioavailability through the esterification of 7-deuterated GS-441524 [58]. We can see from Figure 8C that the pooling layers identified the nucleoside analog core on Remdesivir and VV116. Furthermore, the monophosphate structure is preserved on Remdesivir, and the added benzyl and polar/nonpolar mixed functional groups are shrunk separately. In addition, the three isobutyryl groups on VV116 were also identified and shrunk.

Functional group identification
Water solubility and mutagenicity are both important and widely studied tasks. Wu et al. [59] proposed a GNN-based structure-activity relationship (SAR) mining method named substructure mask explanation (SME), which is based on well-established molecular segmentation methods (BRICS substructures [60], Murcko substructures [61] and manually set functional groups). It analyzes task-related structures through the voting of a consensus model. Our pooling method shows similar results to these molecular segmentation methods.
The RMSE result of ESOL with random splitting after 10 independent runs is 0.748 ± 0.022. The Mutagenicity dataset in [59] contains 7672 compounds and 1 binary label, and the AUC-ROC result is 0.913 ± 0.008. Figure 9 shows the pooling results of 4 instance compounds on the two tasks. The results indicate a pattern that closely resembles BRICS substructure segmentation, particularly for compounds 1 and 3. For the solubility task, MESPool identifies multiple functional groups that are consistent with known chemical knowledge, including hydroxyl, pyrimidine, isopropyl and cyclopropane. In the mutagenicity prediction task, toxicophores such as aromatic nitro, aromatic amine, quinones and polycyclic aromatic systems are combined and separated from the detoxifying carboxylic acid group.

CONCLUSION
We proposed a novel graph pooling method for molecule representation learning called MESPool, which reduces structures by selecting edges and is able to adaptively adjust the pooling ratio.
The biggest difference from previous methods is that MESPool can search for task-relevant structures directly on the original graph, which makes MESPool rational. In this study, we train the three-block network for molecular property prediction end to end without any prior information. MESPool shows better performance than the baseline methods. Meanwhile, the pooling results show good chemical intuition and are consistent with drug design logic.
Learning to find the key substructures is an important subject in many molecule-related tasks. In addition to property prediction, these include drug-drug interactions [62], drug-protein interactions [63] and molecule generation [64]. We look forward to applying the idea of MESPool to more research fields. We believe that an interesting future direction is to strengthen and stabilize the interpretability of MESPool by introducing chemical prior information through pretraining and other methods, and to further enable the identification of novel functional groups.

Key Points
• We proposed a novel edge-based graph pooling method called MESPool, which shows better performance than previous methods on molecular property prediction tasks.
• MESPool identifies task-relevant substructures in a graph, making it rational for molecular representation learning.
• Compared with previous methods, MESPool maintains the original graph connection without missing any node information.
• We introduced edge feature updating based on the framework of the GIN, which alternately updates edge features and node features.

Figure 2. Illustration of the EGIN process. (A) The features of nodes and edges (initially, h (0)

Figure 3. (A) The pooling process of a MESPool layer; (B) the concept illustration of a three-layer MESPool network; (C) the difference between edge selection and node selection: in our architecture, the selected edges/nodes are clustered and pooled into supernodes according to their connection, and edge selection can distinguish the connected substructures better in this case; (D) special cases of shrinkage, in which repeated edges are summed into one edge: (i) a ring preserving two adjacent edges is converted into two nodes connected with two edges after shrinkage; (ii) a ring preserving one edge is converted into a supernode with two self-loops after shrinkage, since the original preserved edge is directed.

Figure 4. Model architecture with pooling layers (A) and without pooling layers (B).

Figure 5. Pooling visualization results of the BACE1 inhibitors GRL-8234 and 5HA (highlight colors are only used to distinguish supernodes and do not represent any properties).

Figure 7. Pooling visualization results of the HIV inhibitors Darunavir and Atazanavir (highlight colors are only used to distinguish supernodes and do not represent any properties).

Figure 8. Original structures of the COVID-19 drugs Remdesivir, VV116 and GS-441524, and the pooling visualization of Remdesivir and VV116 (highlight colors are only used to distinguish supernodes and do not represent any properties).

Figure 9. Pooling visualization results of four instance compounds on two tasks, ESOL and Mutagenicity (highlight colors are only used to distinguish supernodes and do not represent any properties).

Table 1: Common notations used throughout this paper
$G$: A mathematical graph.
$V$: Set of nodes.
$E$: Set of edges.
$\mathcal{N}(i)$: The neighbor node set of node $i$.
$H_E \in \mathbb{R}^{m \times d_e}$: Feature matrix of all edges in $E$.

Table 2: SRC functions of the baseline pooling methods. See Methods for details of the MESPool framework.

Table 5: Regression benchmark results. Note: Bold denotes the lowest mean value; underline denotes the lowest standard deviation.

(Table 2) and MESPool to perform a horizontal comparison. The architecture contains three blocks, and each block consists of a graph convolution layer and a graph pooling layer. Since the previous pooling methods are node-based, we use GIN as the default convolution layer. Additionally, we add a control group with EGIN for TopK, SAGPool and EdgePool as a supplement, since they are able to process edge features. We use EGIN as the convolution layer of MESPool because MESPool relies on edge features. We apply a summation readout function at the end of each block, aggregating the node features to obtain the hierarchical graph representation: